Mixed Models

Main Points

  1. Mixed models are used when you have some type of grouping variable within your data (e.g., departments).

  2. They can also be used when you have a repeated measurement over the same person (e.g., testing).

  3. Mixed models can also incorporate information from both grouping variables and repeated measurements (e.g., yearly job satisfaction across departments).

Packages

We will need the following:


install.packages(c("lme4", "lmerTest", "merTools", "MuMIn"))

Terminology

While the specifics of each model you have learned to this point might take some time to get our heads all the way around, the terminology has largely been pretty clear – no more. You will hear “mixed models”, “mixed effects models”, “hierarchical linear models”, “nested models”, and/or “multilevel models”; these are all slight variations on a common theme. For the sake of our work here, we will stick with mixed models. Within our mixed models, we have an additional source of cloudiness: fixed and random effects. The random effects don’t pose much of an issue (we will define them later), but fixed effects have four different definitions depending upon whom you ask (you will commonly see the term used for panel or time-series data when people want to remove the variation of a group or individual). For the sake of simplicity (again), we are going to treat a fixed effect as an effect on the individual unit of analysis. This will all start to make sense once we take a look at the models.

Equations

Let’s assume that we are going to try to predict sales for a group of stores:

\[sales = \beta_{intercept} + \beta_{rebate}*Rebate + \epsilon\]

The error term is assumed to be normally distributed with a mean of 0 and some standard deviation:

\[\epsilon \sim \mathscr{N}(0, \sigma) \]

With that, we can take those equations and shuffle them around a bit to capture the data generation process for sales:

\[sales \sim \mathscr{N}(\mu, \sigma)\]

\[\mu = \beta_{intercept} + \beta_{rebate}*Rebate\]

Now with that, we can add the effects for another variable at a higher level (i.e., the effects of store):

\[sales = \beta_{intercept} + \beta_{rebate}*Rebate + (effect_{store} + \epsilon)\]

And look at the error terms:

\[ effect_{store} \sim \mathscr{N}(0, \tau)\]

Where we can say that the effect of store is normally distributed with a mean of 0 and some standard deviation.

And rolling it all together, we get:

\[sales = (\beta_{intercept} + effect_{store}) + \beta_{rebate}*Rebate + \epsilon\]
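To make the data-generating process concrete, here is a minimal simulation sketch of those equations. All of the numbers (20 stores, an intercept of 50, a rebate slope of 2, \(\tau = 3\), \(\sigma = 1\)) are made up for illustration:

```r
# A made-up instance of the data-generating process above:
# sales = (intercept + store effect) + slope * rebate + noise
set.seed(1001)

nStores <- 20
nObs <- 30                                       # observations per store

storeEffect <- rnorm(nStores, mean = 0, sd = 3)  # effect_store ~ N(0, tau)

simData <- data.frame(
  store = rep(1:nStores, each = nObs),
  rebate = runif(nStores * nObs, min = 0, max = 10)
)

simData$sales <- (50 + storeEffect[simData$store]) +  # store-varying intercept
  2 * simData$rebate +                                # global rebate effect
  rnorm(nrow(simData), mean = 0, sd = 1)              # epsilon ~ N(0, sigma)
```

A plain lm(sales ~ rebate, data = simData) would recover the slope, but it would fold all of the store-to-store variation into a single residual term; the mixed models below separate those two sources.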

Why Mixed Models

Among other benefits, we will get improved standard errors.

Mixed models don’t get bogged down by large groups (i.e., large groups do not dominate the inference) and the smaller groups do not get buried by the larger groups.

Mixed models will not overfit or underfit when we have repeated samples/measurement. We also get improved estimates for repeated measures when dealing with mixed models.

Our models retain the variation inherent within the data when we use a mixed model.

Visually Explained

That is a pretty clear effect for the number of visits and the total spent.

But let’s look a bit further:

This changes things entirely!

Standard Linear Model

For the sake of conceptual grounding, let’s do some data prep and go back to our standard linear model:


library(data.table)
library(ggplot2)

performanceReview <- fread(
  "http://www.nd.edu/~sberry5/data/performanceReviewData.csv"
  )

lmTest <- lm(performanceScore ~ Manager + age, 
             data = performanceReview)

summary(lmTest)

Call:
lm(formula = performanceScore ~ Manager + age, data = performanceReview)

Residuals:
    Min      1Q  Median      3Q     Max 
-7.3230 -1.7861  0.0972  1.7638  6.5431 

Coefficients:
             Estimate Std. Error t value Pr(>|t|)    
(Intercept) 56.924348   0.373866  152.26   <2e-16 ***
Manager     -5.132527   0.226195  -22.69   <2e-16 ***
age         -0.356581   0.007252  -49.17   <2e-16 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 2.613 on 531 degrees of freedom
Multiple R-squared:  0.8453,    Adjusted R-squared:  0.8447 
F-statistic:  1451 on 2 and 531 DF,  p-value: < 2.2e-16

ggplot(performanceReview, 
       aes(performanceScore, age, color = as.factor(Manager))) +
  geom_point() + 
  theme_minimal()

We have our standard output here. As before, our intercept is the average score when everything is at 0, we have a categorical variable (Manager) with simple contrasts, and the continuous predictor of age.

Let’s see how some additional information changes our viz:


ggplot(performanceReview, 
       aes(performanceScore, age, color = as.factor(Manager))) +
  geom_point() + 
  facet_wrap( ~ idBuilding) +
  theme_minimal()

Random Intercepts Model

Let’s include the department variable in our model. We are not going to add it as another predictor, but we are going to include it as another level to our model. The lme4 package will make this very easy. We will start with an intercept-only model:


library(lme4)

intMod <- lmer(performanceScore ~  1 + (1|idDepartment), 
               data = performanceReview)

Before we look at the summary for this model, let’s get an idea about what is happening in the syntax. The first part of our formula should look familiar – these are the global estimates (fixed effects) within our model and will behave exactly the same as our standard linear model. We are just specifying an intercept here.

The next part, in the parentheses, is how we denote our random effect. Whenever you see a 1 included in a formula interface, we can be pretty comfortable that it refers to an intercept. The | specifies a grouping. With that information, we might be able to guess that we are specifying a random intercept for each department.

We should probably check out the summary:


summary(intMod)

Linear mixed model fit by REML ['lmerMod']
Formula: performanceScore ~ 1 + (1 | idDepartment)
   Data: performanceReview

REML criterion at convergence: 3505.3

Scaled residuals: 
     Min       1Q   Median       3Q      Max 
-2.22396 -0.71120 -0.03982  0.74784  2.24396 

Random effects:
 Groups       Name        Variance Std.Dev.
 idDepartment (Intercept)  5.22    2.285   
 Residual                 39.09    6.252   
Number of obs: 534, groups:  idDepartment, 26

Fixed effects:
            Estimate Std. Error t value
(Intercept)  37.7234     0.5266   71.64

We don’t really get anything too interesting here for now, but let’s explore the random effects components just a little bit. Let’s do the following with our variances:

\[\frac{\text{intercept variance}}{\text{intercept variance} + \text{residual variance}}\]


5.22 / (39.09 + 5.22)

[1] 0.1178064

This is what is known as the intraclass correlation (ICC). It ranges from 0 (no variance between clusters) to 1 (all of the variance lies between clusters). It can be thought of as the proportion of variance in scores that is accounted for by the department alone. Computationally, the ICC is as follows:

\[\rho = \frac{\tau^2}{\tau^2 + \sigma^2} \]

Where \(\tau^2\) is the population variance between clusters and \(\sigma^2\) is the population variance within clusters.

We can compute \(\sigma^2\) as:

\[\sigma^2 = \frac{\Sigma(n_j - 1)S^2_j}{N-C}\]

Where \(S^2_j\) (the variance within the cluster) is defined as:

\[S^2_j = \frac{\Sigma(y_{ij}-\bar{y}_j)^2}{n_j - 1}\]

where

\(n_j\) = sample size for cluster j

N = total sample size

C = total number of clusters

The calculation of \(\tau^2\) is as follows:

\[\tau^2 = \hat{S}^2_B - \frac{\hat{\sigma}^2}{\tilde{n}}\]

where

\[\hat{S}^2_B = \frac{\Sigma n_j(\bar{y}_j - \bar{y})^2}{\tilde{n}(C-1)}\] where

\(\bar{y}_j\) = mean on response variable for cluster j

\(\bar{y}\) = overall mean on response variable

\[\tilde{n} = \frac{1}{C-1} \left[N-\frac{\Sigma n^2_j}{N}\right]\]

We could do all of this work by hand and before running the model, but I’ll let you do that on your own time if you really feel the need to do so.
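If you do want to try it, here is a hedged sketch of those formulas on made-up clustered data (not the performanceReview data; the cluster sizes and effect sizes are arbitrary):

```r
# By-hand ICC components on simulated clustered data
set.seed(1001)
clusterSizes <- c(8, 12, 10, 15, 9)
g <- rep(seq_along(clusterSizes), clusterSizes)
y <- 50 + rnorm(length(clusterSizes), sd = 3)[g] +  # cluster effects
  rnorm(length(g), sd = 5)                          # within-cluster noise

N <- length(y)             # total sample size
C <- length(clusterSizes)  # total number of clusters
nj <- tabulate(g)          # sample size for each cluster j

# sigma^2: pooled within-cluster variance
Sj2 <- tapply(y, g, var)
sigma2 <- sum((nj - 1) * Sj2) / (N - C)

# n-tilde: the adjusted average cluster size
nTilde <- (N - sum(nj^2) / N) / (C - 1)

# tau^2: between-cluster variance
ybarj <- tapply(y, g, mean)
SB2 <- sum(nj * (ybarj - mean(y))^2) / (nTilde * (C - 1))
tau2 <- SB2 - sigma2 / nTilde

icc <- tau2 / (tau2 + sigma2)
```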

We can add a predictor variable to the model:


riMod <- lmer(performanceScore ~ Manager + age + (1|idDepartment), 
              data = performanceReview)

summary(riMod)

Linear mixed model fit by REML ['lmerMod']
Formula: performanceScore ~ Manager + age + (1 | idDepartment)
   Data: performanceReview

REML criterion at convergence: 2002.6

Scaled residuals: 
     Min       1Q   Median       3Q      Max 
-2.59312 -0.72445 -0.02044  0.71297  2.23267 

Random effects:
 Groups       Name        Variance Std.Dev.
 idDepartment (Intercept) 4.899    2.213   
 Residual                 2.028    1.424   
Number of obs: 534, groups:  idDepartment, 26

Fixed effects:
             Estimate Std. Error t value
(Intercept) 56.931410   0.482595  117.97
Manager     -5.270653   0.125466  -42.01
age         -0.354830   0.004066  -87.28

Correlation of Fixed Effects:
        (Intr) Managr
Manager -0.137       
age     -0.395  0.012

When groups are largely balanced, we would find that our coefficients would be the same (or very close to it).

What should almost always change is our standard errors – by integrating information about the groups, we are getting a better sense of how much uncertainty our model contains at the global average level.

We also see some additional information – this is for our random effects. The standard deviation is telling us how much the score moves around based upon department after accounting for our fixed effects (i.e., the score can move around over 2 whole points from department alone).

We can get a good sense of the random effect estimates:


ranef(riMod)

$idDepartment
   (Intercept)
1   1.06318571
2   1.07377298
3   0.22842693
4   3.25746712
5   3.50057630
6   4.26420843
7   1.67845671
8   0.75113541
9  -0.39817728
10  0.90764199
11 -1.09916574
12 -0.66117892
13  2.25956540
14  1.55198242
15 -3.13673779
16 -1.85889149
17 -4.37484768
18 -4.35611033
19 -1.29756268
20  0.81728901
21 -1.20309868
22  0.64450265
23  0.09077807
24 -0.79003136
25 -0.34199796
26 -2.57118923

with conditional variances for "idDepartment" 

We can also use various bits of information in our output to create different \(R^2\) values.

For the first level of the model, we can calculate the \(R^2\) as:

\[R^2_1 = 1 - \frac{\sigma^2_{M1} + \tau^2_{M1}}{\sigma^2_{M0} + \tau^2_{M0}}\]

We can pull that information from our two model outputs:


m1Sigma <- 4.899 # full model random effect intercept

m1Tau <- 2.028 # full model random effect residual

m0Sigma <- 5.22 # null model random effect intercept

m0Tau <- 39.09 # null model random effect residual

1 - ((m1Sigma + m1Tau) / (m0Sigma + m0Tau))

[1] 0.8436696

The fixed effects of manager and age account for nearly 85% of the variation within the scores.

The second level can be calculated as:

\[R^2_2 = 1 - \frac{\sigma^2_{M1} / B + \tau^2_{M1}}{\sigma^2_{M0} / B + \tau^2_{M0}}\]

We see that we added a B into the mix here; it is just the average number of observations per level-2 unit (534 / 26).


level2Mean <- 534 / 26

r2Numerator <- m1Sigma / level2Mean + m1Tau

r2Denominator <- m0Sigma / level2Mean + m0Tau

1 - (r2Numerator / r2Denominator)

[1] 0.9423923

You can also get these values like this:


MuMIn::r.squaredGLMM(riMod)

           R2m       R2c
[1,] 0.8430864 0.9540581

Here we have two values: the marginal R2 (R2m) and the conditional R2 (R2c). You can think of the marginal value as the standard type of R2 – it is the variability explained by the fixed effects part of the model (it is what we computed above). The conditional R2 uses both the fixed and random effects, so the difference between the two (roughly 11%) is the variation accounted for by the department random effect alone.
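As an aside, rather than copying variances out of printed summaries by hand (as we did for the \(R^2\) calculations above), the components can be pulled programmatically with VarCorr(). A minimal sketch on simulated stand-in data (the real models above use performanceReview):

```r
library(lme4)

# Simulated grouped data standing in for the real performance scores
set.seed(1001)
dept <- factor(rep(1:26, each = 20))
score <- 40 + rnorm(26, sd = 2)[dept] + rnorm(length(dept), sd = 6)
simDat <- data.frame(score, dept)

mod <- lmer(score ~ 1 + (1 | dept), data = simDat)

# as.data.frame(VarCorr(...)) returns one row per variance component,
# so nothing needs to be retyped from the summary
vc <- as.data.frame(VarCorr(mod))
icc <- vc$vcov[vc$grp == "dept"] / sum(vc$vcov)
```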

Let’s take a look at the effects:


library(sjPlot)

plot_model(riMod, type = "re") + 
  theme_minimal()

For the plot above, the easiest way to think about it is that the values are the effects for each individual random effect (i.e., each department’s random intercept). Since we are just dealing with intercepts right now, they are the deviations from the fixed intercept. The intercept for the model is ~57 and the random effect for department 6 appears to be about 4.25. If we wanted to predict scores for someone from department 6, we would take 57 + 4.25 for the intercept portion of the model (the same would go for any random slopes in the model). In the end, it is showing how much the intercept shifts from department to department; some departments have a positive effect beyond the average and others have a negative effect (department 17, for instance, is well below the average).
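One way to confirm that reading of the plot: in lme4, coef() returns the per-group intercepts, which are exactly the fixed intercept plus each group’s ranef() deviation. A sketch with simulated stand-in data:

```r
library(lme4)

set.seed(1001)
dept <- factor(rep(1:10, each = 30))
score <- 57 + rnorm(10, sd = 2)[dept] + rnorm(length(dept), sd = 1.5)

mod <- lmer(score ~ 1 + (1 | dept), data = data.frame(score, dept))

perGroup <- coef(mod)$dept[["(Intercept)"]]   # one intercept per group
byHand <- fixef(mod)[["(Intercept)"]] +       # global intercept...
  ranef(mod)$dept[["(Intercept)"]]            # ...plus each group's deviation

all.equal(perGroup, byHand)  # TRUE
```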

We can also plot simulated random effect ranges for each of the random effect groups. We want to pay attention to those that are highlighted with black (i.e., the range does not cross the red line at 0).


library(merTools)

plotREsim(REsim(riMod), labs = TRUE)

In examining the plot, we can see a very broad range of department effects on scores, with most of them being significant (noted by the black dots).

Going back to the output, did you notice anything missing? P-values! Estimating p-values in a mixed model is exceedingly difficult because of varying group sizes, the complete sample n, and how those relate to reference distributions. If you need something that will help, you can get confidence intervals in the same way that you would for any other model:


confint(riMod)

                2.5 %     97.5 %
.sig01       1.679106  2.9393080
.sigma       1.338259  1.5134553
(Intercept) 55.976723 57.8862542
Manager     -5.516426 -5.0246447
age         -0.362799 -0.3468642

Those confidence intervals are interpreted how they always are, but we see two things that don’t look familiar: .sig01 and .sigma. These relate to our random effects: .sig01 is the standard deviation of the random intercept and .sigma is the residual standard deviation.

If you really want to see p-values, you can get them easily:


riModP <- lmerTest::lmer(performanceScore ~ Manager + age + 
                           (1|idDepartment), 
                         data = performanceReview)

summary(riModP)

Linear mixed model fit by REML. t-tests use Satterthwaite's method [
lmerModLmerTest]
Formula: performanceScore ~ Manager + age + (1 | idDepartment)
   Data: performanceReview

REML criterion at convergence: 2002.6

Scaled residuals: 
     Min       1Q   Median       3Q      Max 
-2.59312 -0.72445 -0.02044  0.71297  2.23267 

Random effects:
 Groups       Name        Variance Std.Dev.
 idDepartment (Intercept) 4.899    2.213   
 Residual                 2.028    1.424   
Number of obs: 534, groups:  idDepartment, 26

Fixed effects:
              Estimate Std. Error         df t value Pr(>|t|)    
(Intercept)  56.931410   0.482595  36.539055  117.97   <2e-16 ***
Manager      -5.270653   0.125466 506.793426  -42.01   <2e-16 ***
age          -0.354830   0.004066 507.492183  -87.28   <2e-16 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Correlation of Fixed Effects:
        (Intr) Managr
Manager -0.137       
age     -0.395  0.012

NOTE: I would never load the lmerTest package, but would attach with colons! If you load it, you will find that it masks things from lme4 that you don’t want to have masked (i.e., lmer) and they are not equivalent objects!

Getting predicted values from our mixed model is no different than getting them from any other model:


mixedPred <- predict(riMod)

slimPred <- predict(lmTest)

allPred <- cbind(actual = performanceReview$performanceScore, 
      mixed = mixedPred, 
      slim = slimPred)

head(allPred, 20)

     actual    mixed     slim
1  40.31653 37.59082 36.58406
2  41.99031 41.55702 40.40570
3  38.42164 39.74343 38.74729
4  39.18218 39.08405 37.92053
5  38.11899 38.73618 37.57095
6  31.10169 29.71230 28.66668
7  31.15377 33.45472 32.42756
8  34.37665 36.32163 35.14449
9  34.16259 33.44172 32.41449
10 36.89173 34.56446 33.37865
11 33.55405 33.86593 32.84080
12 44.59988 42.11135 40.96276
13 36.72473 37.38279 36.37500
14 31.56297 32.80710 31.61262
15 42.04933 41.50969 40.35814
16 40.66954 39.73064 38.73443
17 51.75952 50.39556 49.28783
18 43.14972 43.28860 42.30994
19 48.41701 50.43617 49.32864
20 25.17246 27.30469 26.24720

While there were cases where the standard linear model did a slightly better job, our mixed model generally did better. Plotting these makes it even more clear:


plot(allPred[, "actual"], allPred[, "slim"])


plot(allPred[, "actual"], allPred[, "mixed"])
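Beyond eyeballing the plots, a quick root mean squared error puts a number on the comparison. The vectors below are toy values (loosely echoing the table above); in practice you would pass the actual columns of allPred:

```r
# RMSE: lower means predictions sit closer to the actual values
rmse <- function(actual, predicted) sqrt(mean((actual - predicted)^2))

# Toy stand-ins for allPred columns
actual <- c(40.3, 42.0, 38.4, 39.2, 38.1)
mixed  <- c(37.6, 41.6, 39.7, 39.1, 38.7)
slim   <- c(36.6, 40.4, 38.7, 37.9, 37.6)

rmse(actual, mixed)
rmse(actual, slim)
```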

Let’s add some more information to our model. As we dive into our data, we will notice that we also have a building variable. We can add this additional grouping into our model:


clusterMod <- lmer(performanceScore ~ Manager + age + 
                     (1|idDepartment) + (1|idBuilding), 
                   data = performanceReview)

This is often called a cross-classified model.


summary(clusterMod)

Linear mixed model fit by REML ['lmerMod']
Formula: 
performanceScore ~ Manager + age + (1 | idDepartment) + (1 |  
    idBuilding)
   Data: performanceReview

REML criterion at convergence: 1986.7

Scaled residuals: 
     Min       1Q   Median       3Q      Max 
-2.56000 -0.73269 -0.03636  0.71645  2.26861 

Random effects:
 Groups       Name        Variance Std.Dev.
 idDepartment (Intercept) 1.296    1.138   
 idBuilding   (Intercept) 4.024    2.006   
 Residual                 2.028    1.424   
Number of obs: 534, groups:  idDepartment, 26; idBuilding, 8

Fixed effects:
             Estimate Std. Error t value
(Intercept) 57.091966   0.777901   73.39
Manager     -5.271945   0.125411  -42.04
age         -0.354929   0.004059  -87.44

Correlation of Fixed Effects:
        (Intr) Managr
Manager -0.085       
age     -0.247  0.012

plotREsim(REsim(clusterMod), labs = TRUE)

We should also see if our predictions improved:


clusterPredict <- predict(clusterMod)

allPred <- cbind(actual = performanceReview$performanceScore, 
           clustered = clusterPredict,
           mixed = mixedPred, 
           slim = slimPred)

head(allPred, 20)

     actual clustered    mixed     slim
1  40.31653  37.65219 37.59082 36.58406
2  41.99031  41.61932 41.55702 40.40570
3  38.42164  39.80540 39.74343 38.74729
4  39.18218  39.14566 39.08405 37.92053
5  38.11899  38.79769 38.73618 37.57095
6  31.10169  29.77148 29.71230 28.66668
7  31.15377  33.51494 33.45472 32.42756
8  34.37665  36.38247 36.32163 35.14449
9  34.16259  33.50193 33.44172 32.41449
10 36.89173  34.62481 34.56446 33.37865
11 33.55405  33.92626 33.86593 32.84080
12 44.59988  42.17380 42.11135 40.96276
13 36.72473  37.44410 37.38279 36.37500
14 31.56297  32.86696 32.80710 31.61262
15 42.04933  41.57198 41.50969 40.35814
16 40.66954  39.79261 39.73064 38.73443
17 51.75952  50.46033 50.39556 49.28783
18 43.14972  43.35156 43.28860 42.30994
19 48.41701  50.50094 50.43617 49.32864
20 25.17246  27.36320 27.30469 26.24720

For many observations, our predictions definitely got tighter, but there isn’t a big difference between our two mixed models.


plot(allPred[, "actual"], allPred[, "slim"])


plot(allPred[, "actual"], allPred[, "mixed"])


plot(allPred[, "actual"], allPred[, "clustered"])

Hierarchical Models

Hierarchical models are a slight variation on the models that we have just seen. In these models, we have groups nested within other groups. We know that we have a department variable and a building variable. One source of inquiry would be to incorporate the nesting of this information.

The model set-up is just different enough to cause some potential confusion here. Within the parentheses, we have our intercept as before, but we are also saying that we are looking at departments nested within buildings (the grouping follows an A/B construction, where B is nested within A).


hierMod <- lmer(performanceScore ~ Manager + age +  
                  (1|idBuilding/idDepartment), 
                data = performanceReview)

summary(hierMod)

Linear mixed model fit by REML ['lmerMod']
Formula: 
performanceScore ~ Manager + age + (1 | idBuilding/idDepartment)
   Data: performanceReview

REML criterion at convergence: 1986.7

Scaled residuals: 
     Min       1Q   Median       3Q      Max 
-2.56000 -0.73269 -0.03636  0.71645  2.26861 

Random effects:
 Groups                  Name        Variance Std.Dev.
 idDepartment:idBuilding (Intercept) 1.296    1.138   
 idBuilding              (Intercept) 4.024    2.006   
 Residual                            2.028    1.424   
Number of obs: 534, groups:  
idDepartment:idBuilding, 26; idBuilding, 8

Fixed effects:
             Estimate Std. Error t value
(Intercept) 57.091966   0.777901   73.39
Manager     -5.271945   0.125411  -42.04
age         -0.354929   0.004059  -87.44

Correlation of Fixed Effects:
        (Intr) Managr
Manager -0.085       
age     -0.247  0.012

We have our fixed effects for Manager and age, and we have individual intercepts for both idBuilding and department nested within building. Looking at the standard deviation of our random effects, we can see that the scores bounce around a little over 1 point from department to department within a building.

We can also look at our effect ranges:


plotREsim(REsim(hierMod), labs = TRUE)

And checking our predictions again:


hierModPredict <- predict(hierMod)

allPred <- cbind(actual = performanceReview$performanceScore, 
                 hierMod = hierModPredict,
                 clustered = clusterPredict,
                 mixed = mixedPred, 
                 slim = slimPred)

head(allPred, 20)

     actual  hierMod clustered    mixed     slim
1  40.31653 37.65219  37.65219 37.59082 36.58406
2  41.99031 41.61932  41.61932 41.55702 40.40570
3  38.42164 39.80540  39.80540 39.74343 38.74729
4  39.18218 39.14566  39.14566 39.08405 37.92053
5  38.11899 38.79769  38.79769 38.73618 37.57095
6  31.10169 29.77148  29.77148 29.71230 28.66668
7  31.15377 33.51494  33.51494 33.45472 32.42756
8  34.37665 36.38247  36.38247 36.32163 35.14449
9  34.16259 33.50193  33.50193 33.44172 32.41449
10 36.89173 34.62481  34.62481 34.56446 33.37865
11 33.55405 33.92626  33.92626 33.86593 32.84080
12 44.59988 42.17380  42.17380 42.11135 40.96276
13 36.72473 37.44410  37.44410 37.38279 36.37500
14 31.56297 32.86696  32.86696 32.80710 31.61262
15 42.04933 41.57198  41.57198 41.50969 40.35814
16 40.66954 39.79261  39.79261 39.73064 38.73443
17 51.75952 50.46033  50.46033 50.39556 49.28783
18 43.14972 43.35156  43.35156 43.28860 42.30994
19 48.41701 50.50094  50.50094 50.43617 49.32864
20 25.17246 27.36320  27.36320 27.30469 26.24720

Still, not much of a difference between our mixed effects models:


plot(allPred[, "actual"], allPred[, "slim"])


plot(allPred[, "actual"], allPred[, "mixed"])


plot(allPred[, "actual"], allPred[, "clustered"])


plot(allPred[, "actual"], allPred[, "hierMod"])

For these predictions, it really does not look like anything changed at all. You might not see massive changes in fit when moving from model to model. The more important question, though, is your model reflecting the reality of your data?

Random Slopes

Now that we have seen random intercepts and hierarchical models, we can add one final piece: random slopes. In the following model, we will specify a random intercept (recall, it is the 1 within our parentheses) and a random slope (we prefix our grouping variable with the slope of interest). Not only will this model allow the intercept to vary between groups, but it will also allow the slope to vary between groups.
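As a side note on syntax, lme4 accepts a few equivalent (and one deliberately different) spellings of the random-slope term. A sketch on simulated stand-in data:

```r
library(lme4)

# Simulated data with a group-varying slope for age
set.seed(1001)
g <- factor(rep(1:15, each = 25))
age <- runif(length(g), 20, 60)
score <- 60 + rnorm(15, sd = 2)[g] +
  (-0.3 + rnorm(15, sd = 0.05))[g] * age +
  rnorm(length(g))
d <- data.frame(score, age, g)

# (age | g) is shorthand for (1 + age | g): a correlated random
# intercept and random slope
m1 <- lmer(score ~ age + (1 + age | g), data = d)

# (age || g) estimates the same components but fixes the
# intercept-slope correlation at zero
m2 <- lmer(score ~ age + (age || g), data = d)
```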


randomSlopes <- lmer(performanceScore ~ Manager + age + 
                       (age|idDepartment), 
                     data = performanceReview)

summary(randomSlopes)

Linear mixed model fit by REML ['lmerMod']
Formula: performanceScore ~ Manager + age + (age | idDepartment)
   Data: performanceReview

REML criterion at convergence: 2002.6

Scaled residuals: 
     Min       1Q   Median       3Q      Max 
-2.59430 -0.72745 -0.02409  0.71186  2.22131 

Random effects:
 Groups       Name        Variance  Std.Dev.  Corr 
 idDepartment (Intercept) 5.057e+00 2.2487018      
              age         5.563e-07 0.0007459 -1.00
 Residual                 2.028e+00 1.4241169      
Number of obs: 534, groups:  idDepartment, 26

Fixed effects:
             Estimate Std. Error t value
(Intercept) 56.933336   0.488854  116.46
Manager     -5.271157   0.125458  -42.02
age         -0.354869   0.004069  -87.22

Correlation of Fixed Effects:
        (Intr) Managr
Manager -0.136       
age     -0.422  0.012
convergence code: 0
boundary (singular) fit: see ?isSingular

Nothing changes with regard to our fixed effects, but we get some added information in our random effects. The random intercept standard deviation for department tells us how much the scores bounce around from department to department, and the age variance tells us how much variability there is in the slope of age between departments.

We can look at it graphically:


performanceReview$idDepartment <- as.factor(performanceReview$idDepartment)

ggplot(performanceReview, aes(performanceScore, age, 
                              group = idDepartment, color = idDepartment)) +
  geom_smooth(method = "lm", se = FALSE) +
  theme_minimal()

When looking at this visualization, it becomes clear that there really is not too much difference in the slopes of age between departments here.

We can also add random slopes to our hierarchical model:


observationModSlope <- lmer(performanceScore ~ Manager + age + 
                              (age|idBuilding/idDepartment), 
                            data = performanceReview)

summary(observationModSlope)

Linear mixed model fit by REML ['lmerMod']
Formula: 
performanceScore ~ Manager + age + (age | idBuilding/idDepartment)
   Data: performanceReview

REML criterion at convergence: 1987.9

Scaled residuals: 
    Min      1Q  Median      3Q     Max 
-2.5713 -0.7514 -0.0326  0.7007  2.2012 

Random effects:
 Groups                  Name        Variance  Std.Dev.  Corr 
 idDepartment:idBuilding (Intercept) 2.440e+00 1.5621734      
                         age         2.908e-05 0.0053929 -1.00
 idBuilding              (Intercept) 5.608e+00 2.3682083      
                         age         5.576e-08 0.0002361 -0.93
 Residual                            2.003e+00 1.4154356      
Number of obs: 534, groups:  
idDepartment:idBuilding, 26; idBuilding, 8

Fixed effects:
             Estimate Std. Error t value
(Intercept) 57.115452   0.920617   62.04
Manager     -5.269283   0.124735  -42.24
age         -0.355020   0.004181  -84.92

Correlation of Fixed Effects:
        (Intr) Managr
Manager -0.071       
age     -0.300  0.012
convergence code: 0
boundary (singular) fit: see ?isSingular

The Alternative

If you don’t want to go the mixed model route, the natural thing to do would be to conduct a regression for each individual group. Here is what that might look like:


library(purrr)

indPredictions <- performanceReview %>% 
  split(., f = .$idDepartment) %>% 
  map_df(~ data.frame(predicted = predict(lm(performanceScore ~ Manager + age, 
                                             data = .x)), 
                      score = .x$performanceScore)) 

plot(indPredictions$score, indPredictions$predicted)

Longitudinal Data

If we continue to look at our data (and with some knowledge about how NYC does health inspections), we will see that restaurants are rated yearly – let’s use this information in our model. We won’t worry about distance anymore, because now we have a few competing hypotheses. We could imagine two different ways this might work: one in which a restaurant’s score improves as observations increase (it takes some time for the owner to get the staff fully up to speed) or one in which the score decreases as observations increase (the “shine has worn off the penny”).

Let’s do a bit of data processing first.


library(dplyr)

healthData <- fread(
  "https://www.nd.edu/~sberry5/data/healthViolationsDistances.csv"
)

healthData <- healthData[, `:=`(BORO = as.factor(BORO), 
                              cuisine = as.factor(`CUISINE DESCRIPTION`), 
                              distanceCentered = dohDistanceMeter - 
                                mean(dohDistanceMeter))]

healthData$nameLocation <- paste(healthData$DBA, 
                                 healthData$BUILDING, 
                                 sep = "_")

healthData$`GRADE DATE` <- lubridate::mdy(healthData$`GRADE DATE`)

healthData <- setkey(healthData, nameLocation)

healthData <- healthData[order(`GRADE DATE`), observation := 1:.N, 
                         by = nameLocation]

timeReviewed <- healthData[, 
                           .(n = .N), 
                           by = nameLocation][n > 10]

reviewedRest <- healthData[which(healthData$nameLocation %in%
                                   timeReviewed$nameLocation), ]

Now we can fit a mixed model, with each location getting its own intercept.


observationModInt <- lmer(SCORE ~ observation + 
                            (1|nameLocation), 
                          data = reviewedRest)

In this model, we have a fixed effect for observation and we are allowing each location to have its own random intercept. We have essentially created a model that deals with the repeated measures for each of the locations.


plottingData <- head(reviewedRest[order(nameLocation, observation)], 350)

ggplot(plottingData, aes(observation, SCORE, group = nameLocation)) +
  geom_point() +
  geom_smooth(method = "lm") +
  facet_wrap( ~ nameLocation) +
  theme_minimal()


summary(observationModInt)

Linear mixed model fit by REML ['lmerMod']
Formula: SCORE ~ observation + (1 | nameLocation)
   Data: reviewedRest

REML criterion at convergence: 345661

Scaled residuals: 
    Min      1Q  Median      3Q     Max 
-5.0067 -0.5765 -0.1311  0.4606  6.6361 

Random effects:
 Groups       Name        Variance Std.Dev.
 nameLocation (Intercept)  29.46    5.428  
 Residual                 125.82   11.217  
Number of obs: 44622, groups:  nameLocation, 1750

Fixed effects:
             Estimate Std. Error t value
(Intercept) 13.948018   0.159359   87.53
observation  0.461932   0.005331   86.65

Correlation of Fixed Effects:
            (Intr)
observation -0.455

Our fixed effect here would indicate that we have a slight increase in scores as our observations increase, but we can also see that scores will bounce around about 5 points on average by location alone.


29.46 / (29.46 + 125.82)

[1] 0.1897218

We see that the location alone accounts for a substantial chunk of the variance in health scores.


MuMIn::r.squaredGLMM(observationModInt)

           R2m       R2c
[1,] 0.1618007 0.3208427

Adding Random Slopes


observationModSlope <- lmer(SCORE ~ observation + 
                                    (1 + observation|nameLocation), 
                                  data = reviewedRest)

summary(observationModSlope)

Linear mixed model fit by REML ['lmerMod']
Formula: SCORE ~ observation + (1 + observation | nameLocation)
   Data: reviewedRest

REML criterion at convergence: 342130.5

Scaled residuals: 
    Min      1Q  Median      3Q     Max 
-5.3899 -0.5413 -0.0949  0.4448  7.6038 

Random effects:
 Groups       Name        Variance Std.Dev. Corr 
 nameLocation (Intercept)  40.3156  6.3495       
              observation   0.2547  0.5046  -0.56
 Residual                 109.6190 10.4699       
Number of obs: 44622, groups:  nameLocation, 1750

Fixed effects:
            Estimate Std. Error t value
(Intercept) 12.42495    0.18418   67.46
observation  0.63182    0.01486   42.51

Correlation of Fixed Effects:
            (Intr)
observation -0.648
convergence code: 0
Model failed to converge with max|grad| = 0.00375371 (tol = 0.002, component 1)
Model is nearly unidentifiable: very large eigenvalue
 - Rescale variables?

The observation variance tells us how much variability there is in the slope between locations.
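To see that variability directly, coef() will return a separate intercept and slope for every group. A sketch on simulated stand-in data (the real model above uses reviewedRest):

```r
library(lme4)

set.seed(1001)
g <- factor(rep(1:12, each = 20))
obsNum <- rep(1:20, times = 12)
score <- 14 + rnorm(12, sd = 3)[g] +
  (0.6 + rnorm(12, sd = 0.3))[g] * obsNum +
  rnorm(length(g), sd = 2)

mod <- lmer(score ~ obsNum + (1 + obsNum | g),
            data = data.frame(score, obsNum, g))

# One intercept and one slope per group; the slopes can differ in
# magnitude (and even sign) across groups
head(coef(mod)$g)
```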

If we use the predicted values of our model, we can see what our results will look like over the observations:


reviewedRest$mixedPrediction  <-  predict(observationModSlope)

reviewedRest$lmPrediction <- predict(lm(SCORE ~ observation, 
                                        data = reviewedRest))
  
ggplot() + 
  geom_line(mapping = aes(observation, mixedPrediction, group = nameLocation), 
            data = reviewedRest, alpha = .25) + 
  geom_smooth(mapping = aes(x = observation, y = lmPrediction), 
              data = reviewedRest, method = "lm", 
              color = "red") +
  theme_minimal()

Amazing! We can really see the varying intercepts for each restaurant and we can see how the slopes are completely different (some are similar, but we have a completely mixed bag of directions and magnitudes).

It looks like the standard regression line would do okay for many of our locations, but we can see that it would do poorly with many others. This should help to provide an illustration of just how flexible and powerful mixed model can be in your hands.

Extensions

No matter what type of model you want to run, there is a mixed model extension for it. Any distribution can be used with the glmer function from lme4, nonlinear mixed models can be fit with nlme from the nlme package, and generalized additive mixed models can be fit with gamm from mgcv.
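For instance, a random-intercept logistic regression is just a family argument away. A hedged sketch on simulated data (all variable names here are made up):

```r
library(lme4)

set.seed(1001)
g <- factor(rep(1:20, each = 25))
x <- rnorm(length(g))
eta <- -0.5 + 1.2 * x + rnorm(20, sd = 0.8)[g]   # linear predictor
passed <- rbinom(length(g), size = 1, prob = plogis(eta))

# Same random-effect syntax as lmer(), plus a family argument
logitMod <- glmer(passed ~ x + (1 | g),
                  data = data.frame(passed, x, g),
                  family = binomial)

fixef(logitMod)  # estimates are on the log-odds scale
```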